The Asymptotic Convergence-Rate of Q-learning

Author

  • Csaba Szepesvári
Abstract

In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$ the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that $p_{\min} > 0$, where $p_{\min}$ and $p_{\max}$ now become the minimum and maximum state-action occupation frequencies corresponding to the stationary distribution.
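The case split in the abstract can be made concrete with a small calculation. The following is a minimal sketch, not code from the paper: the function names `rate_regime` and `rate_shape` are hypothetical, constants hidden by the O-notation are ignored, and accepting any discount factor in (0, 1) is an assumption of the sketch (the abstract states the result for discount factors above 1/2).

```python
import math


def rate_regime(gamma: float, p_min: float, p_max: float) -> tuple[str, float]:
    """Pick the branch of the rate quoted in the abstract.

    Returns ("polynomial", x) when x = R * (1 - gamma) < 1/2, meaning the
    error decays like 1 / t**x, and ("loglog", 0.5) otherwise, meaning the
    error decays like sqrt(log log t / t).
    """
    # Assumption of this sketch: validate inputs as a standard discounted MDP.
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    if not 0.0 < p_min <= p_max:
        raise ValueError("need 0 < p_min <= p_max")
    ratio = p_min / p_max          # R in the abstract
    x = ratio * (1.0 - gamma)      # exponent candidate R(1 - gamma)
    if x < 0.5:
        return "polynomial", x
    return "loglog", 0.5


def rate_shape(t: int, gamma: float, p_min: float, p_max: float) -> float:
    """Evaluate the shape of the bound at step t, ignoring constants."""
    if t < 3:
        raise ValueError("t must be at least 3 so that log log t is positive")
    regime, exponent = rate_regime(gamma, p_min, p_max)
    if regime == "polynomial":
        return t ** (-exponent)
    return math.sqrt(math.log(math.log(t)) / t)


if __name__ == "__main__":
    # Uneven sampling (R = 0.5) with gamma = 0.9 gives a slow polynomial rate.
    print(rate_regime(0.9, 0.1, 0.2))
    print(rate_shape(10_000, 0.9, 0.1, 0.2))
```

The sketch makes the dependence on $R$ visible: the more uneven the sampling of state-action pairs, the smaller the exponent $R(1-\gamma)$ and the slower the quoted rate.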

Similar resources

From Q($\lambda$) to Average Q-learning: Efficient Implementation of an Asymptotic Approximation

Q($\lambda$) is a reinforcement learning algorithm that combines Q-learning and TD($\lambda$). Online implementations of Q($\lambda$) that use eligibility traces have been shown to speed basic Q-learning. In this paper we present an asymptotic analysis of Watkins’ Q($\lambda$) with accumulative eligibility traces. We first introduce an asymptotic approximation of Q($\lambda$) that appears to be a gain matrix variant of basic Qlearnin...

Asymptotic behavior of a system of two difference equations of exponential form

In this paper, we study the boundedness and persistence of the solutions, the global stability of the unique positive equilibrium point and the rate of convergence of a solution that converges to the equilibrium $E=(\bar{x}, \bar{y})$ of the system of two difference equations of exponential form: \begin{equation*} x_{n+1}=\dfrac{a+e^{-(bx_n+cy_n)}}{d+bx_n+cy_n}, y_{n+1}=\dfrac{a+e^{-(by_n+cx_n)}}{d+...

Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: asymptotic analysis

We present a structured algorithm for solving constrained nonlinear least squares problems, and establish its local two-step Q-superlinear convergence. The approach is based on an adaptive structured scheme due to Mahdavi-Amiri and Bartels of the exact penalty method of Coleman and Conn for nonlinearly constrained optimization problems. The structured adaptation also makes use of the ideas of N...

On the approximation by Chlodowsky type generalization of (p,q)-Bernstein operators

In the present article, we introduce a Chlodowsky variant of $(p,q)$-Bernstein operators and compute the moments of these operators, which are used in proving our main results. Further, we study some approximation properties of these new operators, which include the rate of convergence using the usual modulus of continuity and also the rate of convergence when the function $f$ belongs to the class Li...

Two Novel Learning Algorithms for CMAC Neural Network Based on Changeable Learning Rate

The Cerebellar Model Articulation Controller (CMAC) neural network is a computational model of the cerebellum that acts as a lookup table. The advantages of CMAC are fast learning convergence and the capability of mapping nonlinear functions, due to its local generalization of weight updating, single structure and easy processing. In the training phase, the disadvantage of some CMAC models is unstable phenomenon...

Publication date: 1997